Pursuant to the requirements of 37 C.F.R. §§ 1.821-1.825, Applicant submits the
enclosed Sequence Listing and computer readable form (CRF). The amino acid sequences
disclosed in the specification and drawings may be found in computer readable form in file
010261.txt on the enclosed diskette and are presented in the enclosed paper copy of the
Sequence Listing.

Applicant hereby certifies that the information recorded in computer readable 
form (CRF) supplied on the enclosed diskette as file 010261.txt is identical to the written 
Sequence Listing. The material presented in computer readable form is not new matter because 
it presents sequences the same as those disclosed in the specification, as filed. 

Applicant believes that the requirements of 37 C.F.R. §§ 1.821-1.825 have been

MARKED-UP AMENDED SPECIFICATION PARAGRAPHS

[0018] If the nucleotide sequence is random, the probability that a sequence of
given length translated from it will have a particular amino acid sequence can be calculated simply
by multiplying together the frequencies in the genetic code of the codons encoding each amino acid
[amino acid] in the sequence. Since some amino acids have as many as six codons and others as
few as one, the predicted frequency will vary depending on the amino acid sequence itself. Thus
the sequence LRRLLR (SEQ ID NO: 1), made up entirely of six-codon amino acids, will appear
at a frequency of (6/61)^6, or approximately once in a million codons, and the sequence
MWWMMW (SEQ ID NO: 2), made up entirely of one-codon amino acids, will appear at a
frequency of (1/61)^6, or approximately once in fifty billion codons. The frequencies of other
sequences will fall between these two extremes. The important point for us is that even a relatively
short sequence will appear very rarely, and so if we can determine the amino acid sequence of a
peptide translated from an unknown sequence, we can match it to a portion of the reference sequence
with high specificity.
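
The arithmetic in paragraph [0018] can be sketched in Python as follows. This is an illustrative sketch only and is not part of the specification as filed; the names CODON_COUNTS and peptide_frequency are introduced here for illustration, and the codon counts assume the standard genetic code with 61 sense codons.

```python
from fractions import Fraction

# Number of codons encoding each amino acid in the standard genetic code.
# The counts sum to 61 (64 codons minus 3 stop codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}
SENSE_CODONS = sum(CODON_COUNTS.values())


def peptide_frequency(peptide: str) -> Fraction:
    """Multiply the per-residue codon frequencies, as described in [0018]."""
    freq = Fraction(1)
    for aa in peptide:
        freq *= Fraction(CODON_COUNTS[aa], SENSE_CODONS)
    return freq


for pep in ("LRRLLR", "MWWMMW"):
    one_in = float(1 / peptide_frequency(pep))
    print(f"{pep}: about 1 in {one_in:,.0f}")
```

Exact fractions are used so that the two extremes, (6/61)^6 and (1/61)^6, can be compared without rounding error.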

[0030] Comparison of the experimental results with the values in the table
reveals a match to the predicted mass value for one of the ten candidates, specifically
the sequence that begins at position 3190 of the reference sequence and proceeds from right to
left. Retrieval of the reference sequence beginning at position 3190 indicates that the cloned
sequence begins with "GAATTCTTACACCTCATACTTTCCCAAGCCCCAACTTTCTCATCT
GAAAATGGTAATAGTATCATCCTTACATGTTTAAGGTCATGAATTGCTAT
GTGTA" (1st 100 nucleotides shown) (SEQ ID NO: 3). The identification is confirmed by
dideoxy sequencing from a primer 150 nucleotides upstream of the junction between the pUC19
sequence and the EcoRI fragment.




[0032] The peptide TMITPSLHACRSTLED (SEQ ID NO: 4), representing the
N-terminal 16 amino acids of the alpha-complementing factor of beta-galactosidase encoded in 
pUC19 (and also representing the 16 constant N-terminal amino acids in all of the peptides 
described in Example 1 above) is used to raise a polyclonal rabbit antibody using standard 
procedures. 

[0034] The mass spectrum of the immunoprecipitate from the induced cell
lysate of the clone under examination is observed to contain a distinct peak, at a position
corresponding to a mass of 8485 ± 3 Daltons, that is not observed in the control. Comparison of
the experimental results with the values in the table in Example 1 above indicates that the insert
begins at position 9241 of the reference sequence and proceeds from left to right in the GenBank
sequence. Retrieval of the reference sequence beginning at position 9241 indicates that the
cloned sequence begins with

GAATTCACATAAATCGCAAATTTTTTTTTCCTTCCCAGAGCC
ATCCAAAACTCTGTTTGTCAAAGGCCTGTCTGAGGATACCACTGAAGAGA
CATTAAAG (1st 100 nucleotides shown) (SEQ ID NO: 5). The identification is confirmed by
dideoxy sequencing as described in Example 1.

[0037] To identify the nucleotide sequence adjacent to the pTriplEx vector, each
EcoRI site in the J05584 sequence is identified and ligated, in silico, to the EcoRI site in the
pTriplEx vector. For each such in silico construct, the amino acid sequences of the two expected
hybrid translation products (from each of the start codons in the vector to the first in-frame stop
codon encountered in the insert) are calculated. The mass of each peptide is calculated and all
10 peptide pairs are tabulated, as shown in the table below. Comparison of the experimental
results (i.e., peptides of 4255 and 2635 Da) with the values predicted in the table indicates that
the insert begins at position 4208 of the reference sequence and proceeds in the forward direction.
It is concluded that the 5' end of the sequence joined to the vector is
GAATTCTCTTGGGTTTTGTGGTGTGCTAGACTTAATTACCCATGAATGATTT
TGTCCTCTTGAGAAAATTTCAATAGCACATCTATTAGTGTTTTTTAT.... (1st 100
nucleotides shown) (SEQ ID NO: 6). The identification is confirmed by dideoxy sequencing
from the plasmid using a primer 150 nucleotides 3' to the pTriplEx EcoRI site.



Position of EcoRI site   Orientation in pTriplEx   Start Codon   Predicted Peptide Mass (Da)

3190                     forward                   1st            6137
3190                     forward                   2nd            5707
3190                     reverse                   1st            6278
3190                     reverse                   2nd            3891
4208                     forward                   1st            4255
4208                     forward                   2nd            2635
4208                     reverse                   1st           19748
4208                     reverse                   2nd            3905
6066                     forward                   1st            3595
6066                     forward                   2nd            3606
6066                     reverse                   1st            6401
6066                     reverse                   2nd            1363
9241                     forward                   1st            3583
9241                     forward                   2nd            7122
9241                     reverse                   1st            4582
9241                     reverse                   2nd            1746
9543                     forward                   1st            5306
9543                     forward                   2nd            1477
9543                     reverse                   1st            9906
9543                     reverse                   2nd            2516

The mass values above are computed by translating each hypothetical fusion
polypeptide without the N-terminal methionine that is removed in vivo in E. coli.
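
The comparison step described in paragraph [0037] can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure as filed: the PREDICTED dictionary simply transcribes the table above, while the function name match_candidates and the 3 Da tolerance are hypothetical choices made here for illustration.

```python
# Predicted masses (Da) for the 1st and 2nd start codons, transcribed from the
# table above and keyed by (EcoRI site position, orientation).
PREDICTED = {
    (3190, "forward"): (6137, 5707),
    (3190, "reverse"): (6278, 3891),
    (4208, "forward"): (4255, 2635),
    (4208, "reverse"): (19748, 3905),
    (6066, "forward"): (3595, 3606),
    (6066, "reverse"): (6401, 1363),
    (9241, "forward"): (3583, 7122),
    (9241, "reverse"): (4582, 1746),
    (9543, "forward"): (5306, 1477),
    (9543, "reverse"): (9906, 2516),
}


def match_candidates(observed, tolerance=3.0):
    """Return the candidates whose predicted pair accounts for every observed mass.

    The tolerance (in Da) is an assumed value, not one taken from the source.
    """
    hits = []
    for key, predicted in PREDICTED.items():
        if all(any(abs(obs - p) <= tolerance for p in predicted) for obs in observed):
            hits.append(key)
    return hits


print(match_candidates([4255, 2635]))
```

With the two experimental masses given in the text, only the forward orientation at site 4208 accounts for both peaks.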



[0040] Two oligonucleotide primers are synthesized using standard methods. In
one, CCC GAATTC AGCAGGTAAAAATCAAGG (SEQ ID NO: 7), the first 10 nucleotides
contain an EcoRI site (underlined) and the last 17 nucleotides correspond to the first 17 nucleotides
of exon 2 of the human nucleolin gene. In the other,
GGG GAATTC TTACTCTTCTCCACTGCTAT (SEQ ID NO: 8), the last 17 nucleotides
correspond to the reverse complement of the last 17 nucleotides of exon 2, followed immediately
(in the sense orientation of the oligonucleotide) by the stop codon TAA and a sequence that
includes an EcoRI site (underlined).
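
The layout of the second primer described in paragraph [0040] can be checked with a short Python sketch. This is illustrative only; the helper reverse_complement and the variable names are introduced here, and the primer string is SEQ ID NO: 8 with the spaces removed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


REVERSE_PRIMER = "GGGGAATTCTTACTCTTCTCCACTGCTAT"  # SEQ ID NO: 8, spaces removed
ECORI = "GAATTC"

site_index = REVERSE_PRIMER.index(ECORI)
exon_part = REVERSE_PRIMER[-17:]             # reverse complement of the exon 2 end
sense = reverse_complement(REVERSE_PRIMER)   # primer viewed on the sense strand

print("EcoRI site found at index:", site_index)
print("EcoRI site is palindromic:", reverse_complement(ECORI) == ECORI)
print("Exon-derived 17-mer (sense):", reverse_complement(exon_part))
print("Codon following the 17-mer:", sense[17:20])
```

Viewed on the sense strand, the primer reads as the exon-derived 17-mer, then the stop codon, then the EcoRI site, which is the order given in the text.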

[0044] The program was run with the 24 nucleotide input sequence
CAACTAGAAGAGGTAAGAAACTAT (SEQ ID NO: 9). Two reading frames were selected:
the forward reading frame beginning with the first nucleotide (F1) and the reverse (antisense)
reading frame beginning with the second antisense nucleotide (R2). The results are shown below.

[begin]

Enter Sequence:

[input] CAACTAGAAGAGGTAAGAAACTAT (SEQ ID NO: 9)

[output] Protein: QLEEVRNY (SEQ ID NO: 10)

Which reading frames would you like to examine?

1: Forward (F1)
2: Forward; first base removed (F2)
3: Forward; second base removed (F3)
4: Reverse (R1)
5: Reverse; first base removed (R2)
6: Reverse; second base removed (R3)

[input] 1,5
[output] MASS DIFFERENCES

Location  Mutation      Frame F1    Frame R2

          None           1032.13      722.89
               /A(K)        0.04        0.00
1            C-{G(E)        0.99        0.00
               \T(Z)    -1032.13        0.00
               /G(R)       28.06        0.00
2         (Q)A-{T(L)      -14.97        0.00
               \C(P)      -31.01        0.00
               /G(Q)        0.00        0.00
3            A-{T(H)        9.01        0.00
               \C(H)        9.01        0.00
               /A(I)        0.00      276.34
4            C-{G(V)      -14.03      276.34
               \T(L)        0.00        0.00
               /C(P)      -16.04      299.37
5         (L)T-{A(Q)       14.97      226.32
               \G(R)       43.03      200.24
               /G(L)        0.00      241.29
6            A-{T(L)        0.00      241.33
               \C(L)        0.00      242.28
               /T(Z)     -790.84      -34.02
7            G-{C(Q)       -0.99      -34.02
               \A(K)       -0.95        0.00
               /G(G)      -72.07      -60.10
8         (E)A-{T(V)      -29.99       16.00
               \C(A)      -58.04      -44.04
               /G(E)        0.00      -34.02
9            A-{T(D)      -14.03      -34.02
               \C(D)      -14.03      -48.05
               /T(Z)     -661.72        0.00
10           G-{C(Q)       -0.99        0.00
               \A(K)       -0.95        0.00
               /G(G)      -72.07      -16.04
11        (E)A-{T(V)      -29.99       23.98
               \C(A)      -58.04       43.03
               /T(D)      -14.03        0.00
12           G-{C(D)      -14.03      -14.03
               \A(E)        0.00       34.02
               /T(L)       14.03     -423.52
13           G-{C(L)       14.03     -423.52
               \A(I)       14.03        0.00
               /C(A)      -28.05      -60.04
14        (V)T-{A(E)       29.99      -16.00
               \G(G)      -42.08      -76.10
               /G(V)        0.00      -26.04
15           A-{T(V)        0.00      -49.08
               \C(V)        0.00      -48.09
               /G(G)      -99.14        0.00
16           A-{T(Z)     -433.47        0.00
               \C(R)        0.00        0.00
               /T(I)      -43.03       76.10
17        (R)G-{C(T)      -55.09       16.06
               \A(K)      -28.02       60.10
               /G(R)        0.00       10.04
18           A-{T(S)      -69.11       14.02
               \C(S)      -69.11      -16.00
               /G(D)        0.99        0.00
19           A-{T(Y)       49.08        0.00
               \C(H)       23.04        0.00
               /G(S)      -27.02      -28.05
20        (N)A-{T(I)       -0.94       15.96
               \C(T)      -13.00      -42.08
               /A(K)       14.07       48.05
21           C-{G(K)       14.07       14.03
               \T(N)        0.00       14.03
               /C(H)      -26.04       18.03
22           T-{A(N)      -49.08        0.00
               \G(D)      -48.09        0.00
               /G(C)      -60.04      -12.06
23        (Y)A-{T(F)      -16.00       15.01
               \C(S)      -76.10       43.03
               /C(Y)        0.00      -14.03
24           T-{A(Z)     -163.18        0.00
               \G(Z)     -163.18        0.00

Enter the detection threshold:

[input] 0.8 Dalton

[output] Undetectable amino acid substitutions: 1. (Q)C-A(K)
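
The forward-frame (F1) column of the transcript above can be approximated with the following Python sketch. This is not the program used in the example: the function names are hypothetical, only frame F1 is handled, and the residue masses are commonly tabulated average values rounded to two decimals, assumed here because they agree with the transcript. "Z" marks a stop codon, following the notation of the transcript.

```python
from itertools import product

# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYYZZCCZW" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed average residue masses (Da), rounded to two decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}


def translate(dna):
    """Translate frame F1 up to, but not including, the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "Z":
            break
        peptide.append(aa)
    return "".join(peptide)


def mass(peptide):
    """Sum of residue masses (no terminal water added)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


def scan(dna):
    """Yield (position, wild-type base, mutant base, wild-type residue,
    mutant residue, mass difference) for every single-base substitution."""
    base_mass = mass(translate(dna))
    for pos, ref in enumerate(dna):
        start = pos - pos % 3
        for alt in "ACGT":
            if alt == ref:
                continue
            mutant = dna[:pos] + alt + dna[pos + 1:]
            yield (pos + 1, ref, alt,
                   CODON_TABLE[dna[start:start + 3]],
                   CODON_TABLE[mutant[start:start + 3]],
                   round(mass(translate(mutant)) - base_mass, 2))


def undetectable(dna, threshold):
    """Residue-changing substitutions whose F1 mass shift is below threshold."""
    return [row for row in scan(dna)
            if row[3] != row[4] and abs(row[5]) < threshold]


SEQ = "CAACTAGAAGAGGTAAGAAACTAT"  # SEQ ID NO: 9
print(translate(SEQ), round(mass(translate(SEQ)), 2))
for row in undetectable(SEQ, 0.8):
    print(row)
```

Because this sketch considers frame F1 alone, it may report substitutions that the transcript does not list; the transcript examines a second frame (R2) as well, in which some of those substitutions produce a measurable mass shift.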

[0048] The sequence of exon 2 of the human rds/peripherin gene (Genbank 
accession M73531) is shown below. Intron sequence is shown in lower case; exon sequence in 
upper case. 

gggaagcccatctccagctgtctgtttccctttaagTCGAATCAAGAGCAACGTGGATGGGCG 
GTACCTGGTGGACGGCGTCCCTTTCAGCTGCTGCAATCCTAGCTCGCCACGGCCCTGC 
ATCCAGTATCAGATCACCAACAACTCAGCACACTACAGTTACGACCACCAGACGGAG 
GAGCTCAACCTGTGGGTGCGTGGCTGCAGGGCTGCCCTGCTGAGCTACTACAGCAGCC 
TCATGAACTCCATGGGTGTCGTCACGCTCCTCATTTGGCTCTTCGAGgtaggccctgggcagctg 
ggggtagagggtaaggagagcctcc (SEQ ID NO: 11)



[0049] Two primers, of sequences 

GGCCCGGAATTCTCCAGCTGTCTGTTTCCCTTTAAG (SEQ ID NO: 12) and
AATTTACTCGAGCTACCCCCAGCTGCCCAGGGCCTAC (SEQ ID NO: 13), were synthesized and
used to PCR amplify rds/peripherin exon 2 from an individual known to carry a wild type allele of
rds/peripherin. The amplicon was cut with EcoRI and XhoI and cloned into the EcoRI/XhoI sites of the
pGEX derivative described in Nelson et al. The resulting plasmid was cut with XhoI, treated with
Klenow fragment of DNA polymerase, and self-ligated to produce a construct expected to produce a 
fusion protein with the sequence shown below. 

MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLE 

FPNLPYYIDGDVKLTQSMAIIRYlADKIiNMLGGCPKERAEISM 

DFETLKVDFLSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLD 
AFPKLVCFKKRIEAIPQrDKYLKSSKYIAWPLQGWQATFGGGDHPPKSDLIEGRGIQDLVPH 
TTPHHTTPHHTTPHHTTPQDLNSPAVCFPLSRTKSNVDGRYLVDGVPFSCCNPSSPRPCIQY 
QITNNSAHYSYDHQTEELNLWRGCRAALLSYYSSLMNSMGVVTLLIWLFEVGPGQLGV 
ARSSGRTVTD (SEQ ID NO: 14)

[0053] The amplicons described in the previous example are reamplified using
the upstream primer

5' GGATCCTAATACGACTCACTATAGGGAGACCACC ATG CATCACCATCATCACCATCA
CCACTCTCCAGCTGTCTGTTTCCCTTTAAG (SEQ ID NO: 15) and the downstream primer
5' CTTAGTCATTATACCCCCAGCTGCCCAGGGCCTAC (SEQ ID NO: 16). The upstream
primer contains a T7 promoter followed by a translation initiation sequence (start codon
underlined) followed by a sequence encoding eight histidines followed by sequence identical to
the rds/peripherin sequence immediately 5' to rds/peripherin exon 2. The downstream primer
contains two stop codons (in antisense orientation) preceding the sequence complementary to the
sequence just 3' to rds/peripherin exon 2.
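
The primer features listed in paragraph [0053] can be checked against the sequences given in this document with the Python sketch below. It is illustrative only: the variable names are introduced here, the flanking intron strings are the lower-case sequences from paragraph [0048] converted to upper case, and T7_CORE is the commonly cited T7 promoter consensus, which is an assumption rather than something stated in the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# SEQ ID NO: 15 and SEQ ID NO: 16 with spaces and line breaks removed.
UPSTREAM = (
    "GGATCCTAATACGACTCACTATAGGGAGACCACC"  # 5' tail containing the T7 promoter
    "ATG"                                 # start codon
    "CATCACCATCATCACCATCACCAC"            # histidine codons
    "TCTCCAGCTGTCTGTTTCCCTTTAAG"          # sequence 5' to exon 2
)
DOWNSTREAM = "CTTAGTCATTATACCCCCAGCTGCCCAGGGCCTAC"

T7_CORE = "TAATACGACTCACTATAGGG"  # assumed consensus, not taken from the source
FLANK_5 = "GGGAAGCCCATCTCCAGCTGTCTGTTTCCCTTTAAG"         # intron 5' of exon 2
FLANK_3 = "GTAGGCCCTGGGCAGCTGGGGGTAGAGGGTAAGGAGAGCCTCC"  # intron 3' of exon 2

start = UPSTREAM.index("ATG")
his_codons = [UPSTREAM[i:i + 3] for i in range(start + 3, start + 27, 3)]
gene_part = UPSTREAM[start + 27:]
sense_down = reverse_complement(DOWNSTREAM)

print("T7 promoter present:", T7_CORE in UPSTREAM)
print("Histidine codons:", his_codons)
print("3' end matches 5' flank:", FLANK_5.endswith(gene_part))
print("Downstream primer, sense view:", sense_down)
print("Adjacent stop codons TAA TGA present:", "TAATGA" in sense_down)
print("Sense view starts in 3' flank:", FLANK_3.startswith(sense_down[:22]))
```

The check does not establish that the two stop codons are in frame with the start codon in the amplified product; it only confirms that the stated elements are present in the primers as written.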

[0061] Because the primers are all anchored by non-T nucleotides at their 3'
ends, only three of them will prime a given cDNA sequence. In the case of the hemoglobin
alpha 2 transcript, which ends in the sequence
GCGGCAAAAAAAAAAAAAAAAAAAAAAA... (SEQ ID NO: 17), the primers that are
extended are those ending in G.



